Abstract: When we look at images, certain salient structures often attract our immediate
attention without requiring a systematic scan of the entire image. In subsequent
stages, processing resources can be allocated preferentially to these salient structures. In
many cases this saliency is a property of the structure as a whole, i.e., parts of the structure
are not salient in isolation. In this paper we present a saliency measure based on curvature
and curvature variation. The structures this measure emphasizes are also salient in human
perception, and they often correspond to objects of interest in the image. We present a
method for computing the saliency by a simple iterative scheme, using a uniform network of
locally connected processing elements. The network uses an optimization approach to
produce a “saliency map,” which is a representation of the image emphasizing salient locations.
The main properties of the network are: (i) the computations are simple and local, (ii) globally
salient structures emerge with a small number of iterations, and (iii) as a by-product
of the computations, contours are smoothed and gaps are filled in.
§1  Introduction

Salient  structures  can  often  be  perceived  in  an  image  at  a  glance.  They  appear 
to  attract  our  attention  without  the  need  to  scan  the  entire  image  in  a  systematic 
manner,  and  without  prior  expectations  regarding  their  shape.  The  processes  involved 
in  the  perception  of  salient  structures  appear  to  play  a  useful  role  in  segmentation  and 
recognition,  since  they  allow  us  to  immediately  concentrate  on  objects  of  interest  in  the 
image. 

Consider the images in Figures 1, 2 and 3. Certain objects in each image somehow
attract our attention in a manner often described as ‘preattentive’. For instance, the
large blobs in Fig. 1a and 1b are prominent, although locally the blobs’ contours are
indistinguishable  from  background  contours  on  the  basis  of  local  orientation,  curvature, 
contrast,  etc.  It  seems  as  if  one  must  somehow  capture  most  of  the  curve  bounding  a  blob 
in  order  to  perceive  it  as  a  prominent  structure.  The  circle  in  Fig.  2  is  immediately 
perceived  although  its  contour  is  fragmented,  implying  that  gaps  do  not  hinder  the 
immediate  perception  of  such  objects.  In  this  case  one  must  group  together  several 
line  segments  of  the  circle  to  distinguish  it  from  the  background.  These  examples  also 
demonstrate  that  these  prominent  objects  need  not  be  recognized  in  order  for  them  to  be 
distinguished.  The  image  in  Fig.  3  is  an  edge  image  of  a  car  in  a  cluttered  background. 
Our  attention  is  drawn  immediately  to  the  region  of  interest  in  the  image.  It  seems  that 
the  car  need  not  be  recognized  to  attract  our  attention.  When  the  image  is  inverted 
and  presented  for  short  periods,  recognition  becomes  considerably  more  difficult,  yet 
the  same  region  remains  salient. 

The goal of this paper is to suggest what makes structures such as those in Figs.
1-3 salient, and to propose a mechanism for detecting salient locations in an image. A
locally connected network is proposed that can process images such as the figures above
to construct a “saliency map”, which is a representation of the image emphasizing salient
locations. The computations of the net are devised to meet the following requirements:
(i) the time it takes to detect a prominent structure does not depend on the complexity
of background curves, (ii) curves may have any number of gaps, and (iii) the number of
computations is restricted to the order of dozens or, at most, about a hundred steps
in order to meet the time constraint involved in immediate perception.

Issues  related  to  this  problem  include  segmentation,  perceptual  organization,  and 
figure/ground  separation.  Segmentation  schemes  have  been  investigated  extensively  in 
the  field  of  computer  vision  and  many  algorithms  have  been  suggested.  They  will  not 
be  reviewed  here,  since  they  are  only  marginally  related  to  the  problem  at  hand.  Many 
of  the  segmentation  processes  that  have  been  proposed  were  more  ambitious  than  what 
is  required,  or  what  is  possible,  to  achieve  in  the  early  stages  where  prominent  areas 
are  located.  For  example,  they  attempt  to  segment  the  entire  image  instead  of  just  an 
area of interest. Our proposal is related to the suggestion made by Ullman (1986) that
segmentation should be conducted on an area of interest rather than applied to the
entire image, implying that some preattentive process is required to detect prominent
locations from which an area of interest is defined, prior to the act of segmentation.

Figure 1a. Three prominent blobs are perceived immediately and with little effort. Locally, the
blobs are similar to the background contours (adapted from Mahoney (1986)).

Figure 1b. Intersections were added to illustrate that the blobs are not distinguished by virtue
of their intersections with the background curves.

Figure 2. A circle in a background of 200 randomly placed and oriented segments. The circle is
still perceived immediately although its contour is fragmented.

Figure 3. An edge image of a car in a cluttered background. Our attention is drawn immediately
to the region of interest. It seems that the car need not be recognized to attract our
attention. The car also remains salient when parallel lines and small blobs are removed,
and when the less textured region surrounding parts of the car is filled in with more
texture.

Lowe’s (1985) treatment of perceptual organization is more closely related to the
problem addressed in this paper. The processes proposed by Lowe detect instances
of collinearity, co-termination and parallelism among straight lines, and will not be
effective  in  cases  (e.g.  Fig.  1)  where  these  conditions  do  not  play  a  major  role.  Most  past 
approaches  to  segmentation  also  do  not  meet  the  requirements  set  above.  In  particular, 
they  do  not  meet  the  time  constraint  and  they  depend  critically  on  the  complexity  of 
the  background  curves. 


1.1  Local  and  Global  Saliency 


The phenomena related to the perception of salient structures can be roughly divided
into two classes. The first, referred to as local saliency, occurs when an element
becomes conspicuous by having a simple distinguishing local property such as color,
contrast, orientation, etc. For example, a red item placed among green ones immediately
attracts attention by virtue of its unique color (Treisman and Gelade 1980; Julesz
1981). The second case, referred to as structural saliency, occurs when the structure is
perceived in a more global manner. That is, the local elements of the structure are not
salient as in the former case but instead the arrangement of the elements is what makes
the structure unique and salient.

We  focus  below  on  the  saliency  of  curves,  based  on  properties  measured  along 
them  (the  curves  may  be  continuous  or  with  any  number  of  gaps).  Not  all  phenomena 
of  global  immediate  perception  are  necessarily  accounted  for  by  measuring  properties  of 
curves.  For  instance,  one  could  measure  the  compactness  of  a  structure,  the  degree  of 
symmetry  it  contains  and  other  measures  that  are  region-based  rather  than  curve-based. 
Nevertheless,  properties  of  curves  are  often  sufficient  in  order  to  separate  objects  from 
their  background. 

The fact that structural saliency requires measures that have a global extent introduces
a severe complexity problem. The number of possible groupings of local line
segments into curves, where the curves are allowed to have any number of gaps, explodes
exponentially. The complexity issue becomes acute when considering the fact that a
salient curve of a given length is not necessarily composed of salient sub-parts. Thus,
contemporary pyramid techniques (see Rosenfeld 1986 for a review) would not be appropriate
for detecting structural saliency, because they contain an implicit assumption
that a salient curve is composed of salient sub-parts.

§2  Measuring Saliency as an Optimization Problem

Our  goal  is  to  construct  a  saliency  map  which  is  a  representation  of  the  image 
emphasizing  salient  locations.  We  seek  to  associate,  therefore,  a  measure  of  saliency 
denoted by the function $\Phi(\cdot)$ to each location in the image. A property that seems to
play  a  role  in  structural  saliency  is  the  combination  of  length  and  smoothness  measured 
at  a  particular  scale.  That  is,  a  measure  of  saliency  that  would  account  for  the  type  of 
images  above  is  one  that  favors  long  smooth  curves,  where  the  smoothness  of  a  curve 
is  related  to  its  curvature  or  its  curvature  variation.  We  therefore  face  the  following 
problems: 

(1) Defining an appropriate measure $\Phi$ that, when applied to a point along a given
curve, will increase when the curve increases in length and smoothness.

(2) A selection problem. The measure $\Phi(P)$ depends on the curve passing through
P. Since the curves we are considering are either continuous or separated by
any number of gaps, there will usually be many possible curves to consider. Our
approach to this problem will be to select the curve that maximizes $\Phi(P)$ over
all curves passing through P.

We defer the exact formulation of $\Phi$ until we have examined the manner by which it
is computed. The reason is that the general method of computing $\Phi$ (using a simple local
network) places strong constraints on the possible definition of $\Phi$. In the next sections
we describe the mechanism by which $\Phi$ is computed, and then derive an explicit formula
for $\Phi$.


2.1  The  Basic  Elements 

We assume that $\Phi$ is computed by a locally connected network of processing elements.
Our specific model is that at the level of computing saliency the image is
represented by a network of $n \times n$ grid points, where each point represents a specific
x,y location in the image. At each point P there are k orientation elements coming
into P from neighboring points, and the same number of orientation elements leaving P
to nearby points. Each orientation element $p_i$ responds to an input image by signalling
the presence of the corresponding line segment in the image, so that those elements
that do not have an underlying line segment are associated with an empty area or gap
in the image. We refer to a connected sequence of orientation elements $p_i, \ldots, p_{i+N}$,
each element representing a line segment or a gap, as a curve of length N (note that
curves may be continuous or with any number of gaps). The optimization problem is
formulated as maximizing $\Phi_N$ over all curves of length N starting from $p_i$:

$$\max_{(p_{i+1},\ldots,p_{i+N}) \in \delta^N(p_i)} \Phi_N(p_i,\ldots,p_{i+N})$$

where $\delta^N(p_i)$ is the set of all possible curves of length N starting from $p_i$.

A naive approach to this problem would involve an exhaustive enumeration of all
combinations of $p_{i+1},\ldots,p_{i+N}$, which would require an exponential search space of size
$k^N$ for each element in the network. In what follows, we will show that for a certain
class of measures $\Phi$ (“extensible” measures), the computation becomes linear in N. We
will then define a saliency measure $\Phi$ that measures length and smoothness, and at the
same time is extensible and can be computed efficiently.


2.2  Multistage  Optimization  Approach 


For a certain class of measures $\Phi(\cdot)$, the computation of $\Phi_N$ can be obtained by
iterating a simple local computation. To illustrate, let us consider first curves that are
only three elements long. The problem in this case is:

$$\max_{(p_{i+1},\,p_{i+2}) \in \delta^2(p_i)} \Phi_2(p_i, p_{i+1}, p_{i+2})$$

That is, for a given element $p_i$, determine $p_{i+1}$ (one of $p_i$’s k neighbors) and $p_{i+2}$ (a
neighbor of $p_{i+1}$) such that $\Phi_2(p_i, p_{i+1}, p_{i+2})$ will be maximal. A naive approach will
again require examining the $k^2$ different curves. Assume, however, that $\Phi_2$ satisfies the
condition:

$$\max_{\delta^2(p_i)} \Phi_2(p_i, p_{i+1}, p_{i+2}) = \max_{p_{i+1}} \Phi_1\big(p_i, \max_{p_{i+2}} \Phi_1(p_{i+1}, p_{i+2})\big)$$

In this case maximizing $\Phi_2$ can be achieved by repeating the application of $\Phi_1$ over
shorter curves. The general approach is formulated in a similar manner:


$$\max_{\delta^N(p_i)} \Phi_N(p_i,\ldots,p_{i+N}) = \max_{p_{i+1} \in \delta(p_i)} \Phi_1\big(p_i, \max_{\delta^{N-1}(p_{i+1})} \Phi_{N-1}(p_{i+1},\ldots,p_{i+N})\big) \qquad (2.1)$$

where $\delta(p_i)$ stands for $\delta^1(p_i)$. In this manner we reduce the search space needed for
each curve of length N starting from $p_i$ to the size of $kN$ instead of the $k^N$ that is needed
for the naive approach. The principle in (2.1) is related to the principle of optimality
underlying all multistage decision processes, and in particular it is a special case of
Dynamic Programming. We refer to the family of functions that obey the principle in
(2.1) as extensible functions. We next derive an extensible function that prefers long
curves that have low total curvature.
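The saving promised by (2.1) can be checked on a toy example. The sketch below is only an illustration under stated assumptions: the network is a small random graph rather than a grid of orientation elements, and the measure is a generic nested form `sigma[p] + w * (value of the rest of the curve)` with non-negative weights, which satisfies the extensibility condition. The names `exhaustive_best`, `multistage_best` and `make_toy_network` are hypothetical.

```python
import random

def exhaustive_best(start, neighbors, sigma, weight, n_steps):
    """Best score over every curve with n_steps transitions from `start`.

    Enumerates all k**n_steps curves (k = number of neighbors per element).
    """
    def score(path):
        # Nested form sigma[p0] + w01 * (sigma[p1] + w12 * (...)),
        # evaluated from the far end of the curve inward.
        total = sigma[path[-1]]
        for a, b in zip(reversed(path[:-1]), reversed(path[1:])):
            total = sigma[a] + weight[(a, b)] * total
        return total

    def extend(path):
        if len(path) == n_steps + 1:
            return score(path)
        return max(extend(path + [q]) for q in neighbors[path[-1]])

    return extend([start])

def multistage_best(neighbors, sigma, weight, n_steps):
    """Same maxima for all start elements, using n_steps local sweeps.

    Each sweep costs k evaluations per element, so k * n_steps in total.
    """
    best = dict(sigma)  # curves with zero transitions
    for _ in range(n_steps):
        best = {p: sigma[p] + max(weight[(p, q)] * best[q] for q in neighbors[p])
                for p in neighbors}
    return best

def make_toy_network(n_elements=6, k=3, seed=0):
    """Hypothetical random network: k neighbors per element, weights in [0.1, 1]."""
    rng = random.Random(seed)
    nodes = list(range(n_elements))
    neighbors = {p: rng.sample([q for q in nodes if q != p], k) for p in nodes}
    sigma = {p: rng.choice([0.0, 1.0]) for p in nodes}
    weight = {(p, q): rng.uniform(0.1, 1.0) for p in nodes for q in neighbors[p]}
    return neighbors, sigma, weight
```

Calling `multistage_best(*make_toy_network(), n_steps=4)` returns one value per element, and each value agrees with `exhaustive_best` for that element. The agreement relies on the weights being non-negative, so that the score is monotone in the value of the remaining curve.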


2.3  Deriving an Extensible Function for Measuring Saliency

We next derive an expression for the saliency of an element $p_i$ on a curve $\gamma =
p_i,\ldots,p_{i+N}$. $\gamma$ is a curve of length N, terminating at $p_i$. (For a non-end element, the
saliency is the sum of the contributions of the two sides.) Note that the saliency measure
is associated with each element, not with the entire curve. Two elements along the same
curve may have different saliency measures, depending on their position.

The saliency measure at $p_i$ developed below has the general form $\sum_j w_{ij}\sigma_j$, where
j ranges over all the elements lying on $\gamma$. That is, the saliency at $p_i$ is a weighted sum
of the contributions from all the elements lying on the same curve.

Two  factors  play  a  role  in  the  measure  of  saliency.  The  first  factor  is  related  to  the 
length  of  the  curve,  and  the  second  factor  is  related  to  its  shape.  The  length  of  a  curve 
is  determined  by  the  number  of  elements  on  the  curve  that  have  an  actual  curve  (rather 
than a gap) passing through them. These elements are referred to as active elements,
whereas the elements that are associated with gaps are referred to as virtual elements.
To each element $p_i$ we associate its local saliency $\sigma_i$. If $p_i$ is an active element, then $\sigma_i$
is set to be a positive value, which for the present is set to 1, and for a virtual element
$\sigma_i$ is set to 0. The measure related to the length of the curve $p_i,\ldots,p_{i+N}$ is:

$$\sum_{j=i}^{i+N} \sigma_j \qquad (2.2)$$

The measure above is a sum of the local saliency values of the active elements along
the curve. $\sum \sigma_j$ is in the range of 0 to N + 1 depending on the number of active
elements, implying that a continuous curve scores higher than a fragmented one of the
same length. It is also possible to ‘penalize’ the existence of gaps, especially large
ones, in order to attenuate the measure given to the curve when it is too fragmented.
Penalizing the existence of gaps is obtained by associating an attenuation factor $\rho_i$ with
each element $p_i$. The value of $\rho_i$ determines how quickly the contribution to the saliency
from neighboring elements along the curve decays with distance. It is reasonable to use
only two values for $\rho_i$, depending on whether $p_i$ is an active or virtual element. If $p_i$ is
active then $\rho_i$ is set to a value less than or equal to 1 (for the present it is set to 1). If $p_i$
is virtual, then $\rho_i = \rho < 1$. We then define an attenuation function associated with the
curve $p_i,\ldots,p_j$ as follows:

$$\rho_{i,j} = \prod_{k=i+1}^{j} \rho_k$$

where $\rho_{i,i} = 1$. The measure in (2.2) is modified by the attenuation factors:

$$\sum_{j=i}^{i+N} \rho_{i,j}\,\sigma_j \qquad (2.3)$$

The measure in (2.3) is a weighted contribution of the local saliency values $\sigma_j$ along the
curve, where the weights are inversely related to the number of virtual elements along
$p_i,\ldots,p_j$.

In order to measure the shape of the curve we use a measure that is inversely
related to the total curvature of the curve. The total curvature of a curve $\gamma$ is defined
as $\int_\gamma \left(\frac{d\theta}{ds}\right)^2 ds$, where $\theta(s)$ is the slope along the curve, and $\frac{d\theta}{ds}$ at point P is known as
the local curvature at that point (the inverse of R, the radius of curvature). We would
like to use the total curvature to obtain a measure that is bounded, and is inversely
related to the total curvature. The following measure meets these requirements:

$$e^{-\int_\gamma \left(\frac{d\theta}{ds}\right)^2 ds} \qquad (2.4)$$


which  is  confined  to  values  between  0  and  1.  A  straight  line  receives  the  value  1,  and  a 
meandering  curve  will  approach  the  limit  0  as  its  total  curvature  grows  to  infinity.  To 
obtain a discrete approximation to the measure in (2.4) we denote by $\alpha_k$ the orientation
difference between the k’th element and its successor, and by $\Delta s$ the length of an
orientation element. The local curvature $\frac{1}{R}$ of the curve tangent to these elements (see
Fig. 4) is:

$$\frac{1}{R} = \frac{2\tan(\alpha_k/2)}{\Delta s}$$

The arc’s length is $\alpha_k R$, and therefore the total squared curvature is approximated by:

$$\frac{2\alpha_k \tan(\alpha_k/2)}{\Delta s}$$
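The step from the geometry of Fig. 4 to these two expressions can be spelled out as a short worked derivation. It assumes that the arc of radius $R$ touches each of the two elements at its midpoint, which is one reading of the figure and is not stated explicitly in the text.

```latex
% Two elements of length \Delta s meet at a corner with orientation difference \alpha_k.
% Assumption: the circle of radius R is tangent to each element at its midpoint,
% i.e. at distance \Delta s / 2 from the corner.
\begin{align*}
\tan\frac{\alpha_k}{2} &= \frac{\Delta s/2}{R}
  \quad\Longrightarrow\quad
  \frac{1}{R} = \frac{2\tan(\alpha_k/2)}{\Delta s}, \\[4pt]
\int_{\text{arc}} \Big(\frac{d\theta}{ds}\Big)^{2} ds
  &\approx \frac{1}{R^{2}}\,(\alpha_k R) = \frac{\alpha_k}{R}
  = \frac{2\alpha_k\tan(\alpha_k/2)}{\Delta s}.
\end{align*}
% For small \alpha_k this reduces to \alpha_k^2 / \Delta s, i.e. (\alpha_k/\Delta s)^2 \Delta s.
```

The small-angle limit in the last comment is a quick sanity check: it matches the squared curvature $(\alpha_k/\Delta s)^2$ integrated over one element of length $\Delta s$.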

The discrete approximation to the total curvature measure along $p_i,\ldots,p_j$ is therefore
obtained by:

$$C_{i,j} = \prod_{k=i}^{j-1} f_{k,k+1}$$

where

$$f_{k,k+1} = e^{-\frac{2\alpha_k \tan(\alpha_k/2)}{\Delta s}} \qquad (2.5)$$

$C_{i,j}$ plays the role of a weight given to each local saliency value $\sigma_j$ along the curve. A
measure that gives a high score to long curves with low total curvature is now defined as:

$$\sum_{j=i}^{i+N} C_{i,j}\,\rho_{i,j}\,\sigma_j \qquad (2.6)$$

The measure in (2.6) is a weighted contribution of the local saliency values $\sigma_j$ along the
curve. Each weight is a product of two factors. The first factor is inversely related to
the number of virtual elements along $p_i,\ldots,p_j$, and the second factor is inversely related
to the total curvature of the curve. Curves that will receive a high measure on (2.6)
are long curves that are as straight as possible and have the least number of gaps. The
measure in (2.6) is also extensible according to the definition in (2.1). This can be shown
by induction on the length of the curve, and the proof will not be detailed here.
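To make the roles of the two weights concrete, the following minimal sketch evaluates (2.6) directly for one explicitly given curve, measured at its first element. It assumes $\sigma = 1$ for active and $0$ for virtual elements, $\rho = 1$ for active elements, and $\Delta s = 1$; the gap attenuation of 0.7 is the value the text reports for the implementation. The function names `coupling` and `curve_saliency` are hypothetical.

```python
import math

def coupling(alpha, ds=1.0):
    """Coupling constant f_{k,k+1} = exp(-2 * alpha * tan(alpha / 2) / ds), as in (2.5)."""
    return math.exp(-2.0 * alpha * math.tan(alpha / 2.0) / ds)

def curve_saliency(active, alphas, rho_virtual=0.7, ds=1.0):
    """Measure (2.6) at the first element of an explicit curve.

    active[j] -- True if element j is active, False if it is virtual (a gap)
    alphas[k] -- orientation difference between element k and element k + 1
    """
    assert len(alphas) == len(active) - 1
    total = 0.0
    c = 1.0    # running C_{i,j}, with C_{i,i} = 1
    rho = 1.0  # running rho_{i,j}, with rho_{i,i} = 1
    for j, is_active in enumerate(active):
        if j > 0:
            c *= coupling(alphas[j - 1], ds)
            rho *= 1.0 if is_active else rho_virtual
        sigma = 1.0 if is_active else 0.0
        total += c * rho * sigma
    return total
```

For five collinear active elements the function returns N + 1 = 5. Turning the middle element into a gap lowers the score to 1 + 1 + 0.7 + 0.7 = 3.4, and bending the continuous curve by 0.3 radians at every joint lowers it as well.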


Other functions for measuring the optimality of curves, using multistage optimization,
were suggested by Ballard and Sklansky (1976), Martelli (1976) and Montanari
(1971). The optimal curve in these cases is one that maximizes the sum of gray levels
or edge magnitude along the curve, while minimizing the sum of orientation differences.
In our terminology, the optimization function is:

$$\sum_{j=i}^{i+N} \sigma_j - \sum_{j=i}^{i+N} \alpha_j$$

This measure, however, is insensitive to the distribution of orientation differences along
the curve and in general does not satisfy the requirement to prefer long and
as-straight-as-possible curves.
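The insensitivity can be illustrated with two continuous curves that contain the same single turn, placed either at the start or at the end. The sketch below is an illustration only: it assumes all elements are active (so attenuation plays no role), $\Delta s = 1$, and it evaluates the multiplicative curvature weighting of (2.6) at the first element. The names `additive_score` and `multiplicative_score` are hypothetical.

```python
import math

def coupling(alpha, ds=1.0):
    """Coupling constant exp(-2 * alpha * tan(alpha / 2) / ds), as in (2.5)."""
    return math.exp(-2.0 * alpha * math.tan(alpha / 2.0) / ds)

def additive_score(sigmas, alphas):
    """Sum of local saliency values minus sum of orientation differences."""
    return sum(sigmas) - sum(alphas)

def multiplicative_score(sigmas, alphas, ds=1.0):
    """Curvature-weighted sum of (2.6) for a gap-free curve, at its first element."""
    total, c = 0.0, 1.0
    for j, s in enumerate(sigmas):
        if j > 0:
            c *= coupling(alphas[j - 1], ds)
        total += c * s
    return total

sigmas = [1.0] * 5                   # five active elements
turn_early = [0.8, 0.0, 0.0, 0.0]    # one turn right after the first element
turn_late = [0.0, 0.0, 0.0, 0.8]     # the same turn before the last element

additive_early = additive_score(sigmas, turn_early)
additive_late = additive_score(sigmas, turn_late)
weighted_early = multiplicative_score(sigmas, turn_early)
weighted_late = multiplicative_score(sigmas, turn_late)
```

The additive scores are identical for the two curves, while the weighted score is higher for the late turn: seen from the first element, a long straight run followed by a turn keeps more of the curve strongly coupled than a turn followed by a straight run.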


Figure 4. A discrete approximation to the curvature. R approximates the radius
of curvature, $\alpha$ is the orientation difference, $\Delta s$ is the length of both
elements.


§3  The  Saliency  Network 

In  this  section  we  summarize  the  computation  performed  by  the  network  and  its 
relation  to  the  saliency  measure  defined  above.  The  orientation  elements  constitute  the 
basic  computing  elements  of  the  net.  Each  element  pi  is  associated  with  a  processor 
that  can  perform  some  computation  based  on  its  state  and  the  state  of  its  k  neighboring 
processors. This defines a uniform network containing $kn^2$ processing units, with local
communication.  In  the  current  implementation  k  is  equal  to  16,  providing  a  reasonable 
angular  resolution. 


3.1  Computation  of  Elements  in  the  Network 

With each element $p_i$ is associated a state variable denoted by $E_{p_i}$ and a set of
three attributes that includes its local saliency $\sigma_i$, its orientation $\theta_i$, and its attenuation
factor $\rho_i$. Each element $p_i$ updates its state variable $E_{p_i}$ iteratively through a local
computation. (We use here the notation $E_{p_i}$ to indicate explicitly that the variable
is associated with the element $p_i$ in the network.) At the end of iteration N, $E_{p_i}$
contains the measure of saliency derived in (2.6) which will be maximal over all possible
curves of length N starting at $p_i$, where these curves are either continuous or with any
number of gaps.

$E_{p_i}$ is updated by the following computation:

$$E_{p_i}^{(0)} = \sigma_i, \qquad E_{p_i}^{(n+1)} = \sigma_i + \max_{p_j \in \delta(p_i)} \rho_j\, f_{i,j}\, E_{p_j}^{(n)} \qquad (3.1)$$

where $p_j$ is one of k possible neighbors of $p_i$, and $f_{i,j}$ are the “coupling constants”
defined in (2.5). To unravel the recurrence formula above, we isolate a specified curve $\gamma$
represented by $p_i,\ldots,p_{i+N}$ where each element along the curve has only a single neighboring
element to communicate with. The following proposition relates the value of the
state variable of $p_i$ with the measure in (2.6).

Proposition 1:

$$E_{p_i}^{(N)} = \sum_{j=i}^{i+N} C_{i,j}\,\rho_{i,j}\,\sigma_j$$

The proof is by induction on the length of the curve and will not be detailed here.
The proposition above together with the fact that the measure is extensible implies that
among all possible curves $\gamma_i$ of length N starting from $p_i$, either continuous or with any
number of gaps, $E_{p_i}$ will be computed along that curve which is maximal with respect
to the measure in (2.6), namely

$$\max \sum_{j} C_{i,j}\,\rho_{i,j}\,\sigma_j$$

taken over all $\gamma_i$. It is worth noting that the fact that the measure $\Phi$ is extensible does
not imply that the optimal contour through P simply extends itself as the iterations
proceed. In fact, the optimal curve at stage N + 1 can be different from the optimal
curve at stage N.
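The relation stated in Proposition 1 can be checked numerically on a small example. The sketch below is hedged in one important respect: it assumes the local update has the form $E_i \leftarrow \sigma_i + \max_j \rho_j f_{i,j} E_j$, which is a form consistent with the definitions of $\rho_{i,j}$ and $C_{i,j}$ in Section 2.3 but is an assumption here, not a transcription of the authors' implementation. The names `run_network` and `direct_measure` are hypothetical, and the network is an arbitrary directed graph rather than a grid.

```python
import math

def coupling(alpha, ds=1.0):
    """Coupling constant exp(-2 * alpha * tan(alpha / 2) / ds), as in (2.5)."""
    return math.exp(-2.0 * alpha * math.tan(alpha / 2.0) / ds)

def run_network(sigma, rho, neighbors, f, n_iterations):
    """Synchronous iteration of the assumed update E_i <- sigma_i + max_j rho_j f_ij E_j.

    Returns the final states and, for each element, the neighbor chosen in the
    last iteration (None if no neighbor contributed a positive value).
    """
    state = dict(sigma)                      # iteration 0: E_i = sigma_i
    preferred = {p: None for p in sigma}
    for _ in range(n_iterations):
        new_state = {}
        for p in sigma:
            best_q, best_val = None, 0.0
            for q in neighbors[p]:
                val = rho[q] * f[(p, q)] * state[q]
                if val > best_val:
                    best_q, best_val = q, val
            new_state[p] = sigma[p] + best_val
            preferred[p] = best_q
        state = new_state
    return state, preferred

def direct_measure(path, sigma, rho, f):
    """Measure (2.6) summed explicitly along a given path, at its first element."""
    total, c, r = 0.0, 1.0, 1.0
    for idx, p in enumerate(path):
        if idx > 0:
            c *= f[(path[idx - 1], p)]       # C_{i,j}: product of coupling constants
            r *= rho[p]                      # rho_{i,j}: product of rho_k, k = i+1..j
        total += c * r * sigma[p]
    return total
```

On an isolated chain of six elements with a two-element gap, five iterations of `run_network` give the first element the same value as `direct_measure` over the whole chain. When an element has both a collinear neighbor and a sharply turning one with equal states, the recorded preference is the collinear neighbor.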

The state values of elements in the network form a new representation of the image
which is a ‘biased’ view of the visual environment, emphasizing interesting or conspicuous
locations. We denote this representation as the saliency map. The term saliency
map was used by Koch and Ullman (1986) for representing (using our terms) local
saliency.


3.2  Additional Properties of the Network

Convergence of the State Values

The  concept  of  an  iterative  computation  raises  the  issue  of  convergence  when  the 
number  of  iterations  goes  to  infinity.  This  issue  is  important  in  the  context  of  the 
saliency network because an element $p_i$ might be influenced by its own state in a feedback
loop  if  it  lies  on  a  closed  curve.  The  following  proposition  considers  a  closed  curve  and 
evaluates  the  state  of  an  element  of  the  curve  after  an  infinite  number  of  iterations. 

Proposition 2:

Consider $p_i,\ldots,p_{i+N}$, a closed curve where $p_i = p_{i+N+1}$. The state of $p_i$ converges to
the following value:

$$\lim_{k \to \infty} E_{p_i}^{(kN)} = \frac{E_{p_i}^{(N)}}{1 - C_{i,i+N}\,\rho_{i,i+N}}$$

The proof is by induction on the length of the curve. The main point to notice is
that a closed curve (even if it is fragmented) will increase its value when the number of
iterations exceeds the curve’s perimeter. If we consider a continuous circle of radius r,
for example, then $C_{i,i+N} = e^{-2\pi/r}$, which is always less than 1. In practice, the increase
is considerably smaller than the limiting value because we perform a restricted number
of iterations.
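The geometric-series behaviour behind Proposition 2 can be seen on the simplest closed curve, a ring of identical active elements. The sketch below rests on several assumptions that are not in the text: every element has $\sigma = 1$ and $\rho = 1$, every joint has the same orientation difference $2\pi/M$ for a ring of $M$ elements, $\Delta s = 1$, and each element listens only to its successor on the ring. Under these assumptions every element obeys the same scalar recurrence `state = 1 + f * state`. One trip around the ring is counted as $M$ terms, so the indexing may differ by one from the way the proposition counts N. The names `ring_state` and `ring_limit` are hypothetical.

```python
import math

def ring_coupling(n_elements, ds=1.0):
    """Coupling constant between consecutive elements of a regular closed ring."""
    alpha = 2.0 * math.pi / n_elements
    return math.exp(-2.0 * alpha * math.tan(alpha / 2.0) / ds)

def ring_state(n_elements, n_iterations, ds=1.0):
    """State of any ring element after n_iterations of state <- 1 + f * state."""
    f = ring_coupling(n_elements, ds)
    state = 1.0                      # iteration 0: state equals sigma = 1
    for _ in range(n_iterations):
        state = 1.0 + f * state
    return state

def ring_limit(n_elements, ds=1.0):
    """One-trip value divided by (1 - weight of a full trip around the ring)."""
    f = ring_coupling(n_elements, ds)
    one_trip = sum(f ** j for j in range(n_elements))
    return one_trip / (1.0 - f ** n_elements)
```

For a ring of 16 elements, `ring_state(16, 500)` agrees with `ring_limit(16)` to high precision, and both equal `1 / (1 - f)`. The value after a single trip, `ring_state(16, 15)`, is strictly smaller, which reflects the remark that a restricted number of iterations stays below the limiting value.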


Tracing the Curve Starting From a Given Element

The computation performed by each element includes a local preference between
neighboring elements. That is, at each iteration each element $p_i$ selects the neighbor $p_j$
that contributes the most to its state. The information regarding local preference can
be used to trace a linked curve starting from $p_i$ in a recursive manner, namely, $p_j$ is the
second element in the curve, $p_j$’s preferred neighbor is the third element, etc. Given
a conspicuous element as a starting point, we could extract the curve that is optimal
according to (2.6). Examples of these curves are given in section 4.
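The tracing step amounts to following a chain of pointers. A minimal sketch, assuming the local preferences have already been collected into a dictionary mapping each element to its preferred neighbor (or `None` when it has none); the function name `trace_curve`, the `max_length` cap, and the rule of stopping when an element is revisited are illustrative choices rather than details taken from the text.

```python
def trace_curve(start, preferred, max_length):
    """Follow preferred-neighbor links from `start`.

    Stops when there is no preferred neighbor, when the curve closes on an
    element already visited, or when max_length elements have been collected.
    """
    curve = [start]
    seen = {start}
    current = start
    while len(curve) < max_length:
        nxt = preferred.get(current)
        if nxt is None or nxt in seen:
            break
        curve.append(nxt)
        seen.add(nxt)
        current = nxt
    return curve
```

With `preferred = {"e": "a", "a": "b", "b": "c", "c": "d", "d": None}`, tracing from `"e"` yields the five elements in order. On a closed loop the revisit check ends the trace after one full trip.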


Filling  Gaps  by  the  Saliency  Network 

The  ability  to  cope  with  gaps  is  important  for  the  applicability  of  the  saliency 
network  to  real  images.  Edge  maps  obtained  from  real  images  are  often  corrupted 
by  multiple  gaps,  and  what  seems  as  a  smooth  salient  curve  often  turns  out  to  be 
fragmented  after  edge  detection  has  been  applied. 

A virtual element (that lies in a gap) participates in the computation of (3.1) in a
similar manner to active elements. Consider for instance a gap starting from $p_{j+1}$ and
ending at $p_{j+k}$. That is, $p_j$ is an active element, but $p_{j+1},\ldots,p_{j+k}$ are virtual elements.
An element will update its state provided that it has at least one neighbor with a state
value different from 0. It will take at most k iterations for $p_{j+k}$ to update its state.
The network will fill in a curve $\gamma_i$ that will maximize the value of $\rho^{|\gamma_i|} C_{j,j+k}$. That is,
the preference is for filled-in curves having low total curvature (high $C_{j,j+k}$), while minimizing
their overall length $|\gamma_i|$. The relative weight of the two factors is controlled by setting
the value of $\rho$. In the current implementation $\rho$ was set to 0.7, which was found
experimentally to give results that are generally in agreement with our own perception.
The  curves  generated  in  this  manner  are  similar  (for  orientation  difference  less  than  j) 
to  several  other  methods  for  completing  gaps  in  contours  and  for  modelling  subjective 
contours  in  human  perception  (Rutkowski  1979;  Ullman  1976;  Webb  and  Pervin  1984). 


3.3  Additional  Computations  Performed  by  the  Network 


Measure  of  Saliency  Based  on  Low  Curvature  Variation 

The computation of the network summarized in (3.1) produces a saliency map based
on the measure in (2.6). This does not rule out the possibility of additional properties
that mediate structural saliency. For instance, the blobs in Fig. 1 seem to be prominent
on the basis of low curvature variation rather than low overall curvature. A second
saliency measure was therefore formulated that prefers long curves with low total curvature
variation. Details of this second measure can be found in (Sha’ashua 1988). As
a  result,  the  saliency  network  constructs  two  saliency  maps,  one  for  each  property,  from 
which  salient  locations  can  be  detected. 


Smoothing  the  Measured  Curves 

The input to the saliency network is an edge map that determines which of the
network’s elements are active. The edges in the edge map are often noisy, due to sensor
noise, quantization effects, and various effects of the edge detection process. Reducing
noise is important because what appears to be a smooth curve to our visual system may
turn out to be rather serrated at the edge map level. Smoothing can be obtained in
part by analyzing the same image at different resolutions. It turns out, however, that
some smoothing is often desired within a given scale of analysis.

A naive approach would be to extract all curves, replace them by a smooth approximation
and then apply the saliency network to the smoothed curves. However, such
an approach will encounter the same complexity issue regarding the number of possible
curves discussed in section 1.1. We handle the problem of smoothing curves as a local
computation that is performed within the saliency network itself, as an integral part
of computing the saliency measure. In a nutshell, the coordinates associated with each
orientation element are modified in an iterative manner, to smooth the curve passing
through that element. The approach underlying the computation is to associate an energy
level to each curve so that the smooth approximation is of minimum energy. The
energy functional is given by:

$$e(\gamma) = \sum_{j=i}^{i+N} \lambda \left( (x_j - x_j^0)^2 + (y_j - y_j^0)^2 \right) + \int_{\gamma} \left( \frac{d\kappa}{ds} \right)^2 ds$$

where $(x_j, y_j)$, $j = i, \ldots, i+N$, are the coordinates of the smooth approximation to the
curve $(x_j^0, y_j^0)$, $j = i, \ldots, i+N$. A curve of minimum energy is one that minimizes its
total curvature variation while being as close as possible to the original curve. The
parameter $\lambda$ controls the relative weight between the two terms (for a similar energy
functional see Poggio et al. (1985)). The energy is lowered at each iteration in a process
that involves only local computations. These local computations are combined with
those in (3.1), resulting in a network which measures the saliency of curves while smoothing
them simultaneously. The details can be found in (Sha’ashua 1988).
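
The trade-off between the two terms of the energy can be illustrated with a small sketch. It evaluates a discretized energy for a polyline; the discretization (turning angles as curvature, squared differences of successive turning angles as curvature variation), the function names, and the parameter `lam` are assumptions made for this illustration and are not taken from (Sha’ashua 1988).

```python
import math

def turning_angles(points):
    """Turning angle at each interior vertex, used here as a discrete curvature."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        before = math.atan2(y1 - y0, x1 - x0)
        after = math.atan2(y2 - y1, x2 - x1)
        # Wrap the difference into [-pi, pi).
        angles.append((after - before + math.pi) % (2 * math.pi) - math.pi)
    return angles

def curvature_variation(points):
    """Sum of squared differences between successive turning angles (assumed form)."""
    k = turning_angles(points)
    return sum((b - a) ** 2 for a, b in zip(k, k[1:]))

def data_term(points, original):
    """Squared distance between the candidate curve and the original curve."""
    return sum((x - x0) ** 2 + (y - y0) ** 2
               for (x, y), (x0, y0) in zip(points, original))

def energy(points, original, lam=1.0):
    """Weighted closeness to the original plus curvature variation."""
    return lam * data_term(points, original) + curvature_variation(points)

# A serrated curve and a straightened candidate approximation of it.
original = [(0, 0.0), (1, 0.3), (2, -0.3), (3, 0.3), (4, -0.3), (5, 0.0)]
straight = [(x, 0.0) for x, _ in original]

print(energy(original, original, lam=1.0), energy(straight, original, lam=1.0))
```

In this toy case, a small `lam` gives the straightened curve a lower energy than the serrated one, while a large `lam` makes fidelity to the original dominate, so the serrated curve is preferred.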

§4  Examples  of  the  Saliency  Network  at  Work 

The main issues illustrated by the examples are (i) the saliency map, and (ii) the
creation of linked curves as a by-product of the saliency computation.

Prominent locations in the image are represented as elements having a high measure
of saliency as computed by the network. For illustration purposes the saliency map will
be displayed as a gray-level image in which an element $p_i$ is displayed as a bar of width
$w_i$ and intensity value $T_i$, given by:

$$T_i = \left\lfloor \frac{E_{p_i}}{\max_{p_j} E_{p_j}} \cdot 255 \right\rfloor, \qquad w_i = \left\lceil \frac{T_i}{255} \cdot 4 \right\rceil$$

In  other  words,  increased  saliency  measure  corresponds  to  an  increase  in  brightness  and 
in  width  of  the  element  in  the  display.  The  most  salient  element  is  displayed  as  a  white 
bar  of  width  four,  and  the  least  salient  element  is  displayed  as  a  black  segment. 
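
The display mapping can be sketched in a few lines. The function name, the default arguments, and the use of floor for the intensity and ceiling for the width are assumptions made for this illustration; the only behavior taken from the text is that brightness and width grow with saliency and that the most salient element is a white bar of width four.

```python
import math

def display_params(saliencies, max_width=4, max_level=255):
    """Map raw saliency values to (intensity, bar width) pairs for display.

    Assumes at least one saliency value is strictly positive.
    """
    top = max(saliencies)
    params = []
    for e in saliencies:
        # Intensity: saliency relative to the maximum, scaled to the gray range.
        t = math.floor(e / top * max_level)
        # Width: intensity rescaled to the range of bar widths.
        w = math.ceil(t / max_level * max_width)
        params.append((t, w))
    return params

print(display_params([0.0, 1.0, 2.0, 4.0]))
# → [(0, 0), (63, 1), (127, 2), (255, 4)]
```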

The  first  example  is  a  synthetic  image  (not  produced  by  edge  detection)  shown 
in  Fig.  2.  It  is  constructed  from  a  fragmented  circle  placed  among  a  background  of 
randomly  placed  and  oriented  elements.  The  number  of  background  elements  is  200 
and  the  circle  consists  of  60  elements.  The  circle  is  immediately  perceived  by  our  visual 
system.  The  saliency  network  is  applied  to  this  image  for  ten  iterations.  Fig.  5  presents 
the  saliency  map  after  that  period,  and  Fig.  6  presents  the  selected  curve  starting  from 
the  most  salient  element.  The  result  is  in  agreement  with  the  perception  of  the  circle 
by  our  visual  system.  The  saliency  measure  of  each  element  of  the  circle  is  significantly 
higher  than  the  measure  given  to  the  background  elements.  In  this  regard,  the  circle 
virtually  ‘pops-out’  from  the  saliency  map. 
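
A test input of this kind can be generated with a short sketch. The element representation `(x, y, theta)`, the image size, the circle radius, and the random seed are all assumptions made for this illustration; only the element counts (60 on the circle, 200 in the background) come from the text.

```python
import math
import random

def make_test_image(n_circle=60, n_background=200, radius=40.0, size=128.0, seed=0):
    """Return (circle, background) lists of oriented elements (x, y, theta)."""
    rng = random.Random(seed)
    cx = cy = size / 2
    circle = []
    for k in range(n_circle):
        phi = 2 * math.pi * k / n_circle
        # The element is tangent to the circle; orientation is taken modulo pi.
        theta = (phi + math.pi / 2) % math.pi
        circle.append((cx + radius * math.cos(phi),
                       cy + radius * math.sin(phi),
                       theta))
    # Background elements have uniformly random position and orientation.
    background = [(rng.uniform(0, size), rng.uniform(0, size), rng.uniform(0, math.pi))
                  for _ in range(n_background)]
    return circle, background

circle, background = make_test_image()
print(len(circle), len(background))
```

Passing `n_background=400` gives the denser background used in the follow-up experiment.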

The second point to notice is that a complete object is separated from the
background although it is initially fragmented. This agrees with the observation that
perception is not severely affected by the presence of gaps. The final point to notice is
that although the length of the salient curve is 60 elements, the number of iterations
required for distinguishing the circle from its background is considerably smaller. This
happens because, although no single element of the circle is salient by itself, groups of
ten elements already become sufficiently salient. Outside the circle, the probability of
having a low curvature chain of length ten is low. In fact, the probability remains small
even when the number of background elements increases considerably. To illustrate,
we doubled the number of background elements as shown in Fig. 7. We again applied
ten iterations to produce the saliency map in Fig. 8. Starting from the most salient
element, the curve extracted by the network is identical to the one in Fig. 6.

The  next  example  is  the  image  in  Fig.  3.  Fig.  9  shows  the  saliency  map  after 
30  iterations.  Only  the  region  surrounding  the  car  is  displayed.  The  saliency  measure 
given  to  most  of  the  elements  of  the  car  is  significantly  higher  than  that  given  to  the 
background  elements.  Fig.  10  displays  the  five  most  salient  curves  obtained  by  tracing 
the  most  salient  elements.  Note  that  the  traced  curves  have  been  smoothed,  and  that 
the  gaps  have  been  filled  in.  The  results  suggest  that  the  saliency  computation  is  useful 
for  distinguishing  significant  structures  in  the  image. 

The final example is the image in Fig. 1a. The input to the network was obtained
by  edge  detection  from  the  original  hand-drawn  image.  We  show  the  results  for  a  part 
of  the  image  containing  one  of  the  blobs.  Fig.  11  displays  the  saliency  map  for  low 
curvature  variation  after  160  iterations,  which  is  twice  the  number  of  elements  on  the 
perimeter  of  the  blob.  The  elements  of  the  blob  become  stronger  than  the  background 
elements  after  70  iterations,  in  agreement  with  the  observation  that  one  must  capture 
almost  the  entire  blob  in  order  to  perceive  it  as  prominent.  Interestingly,  the  results  of 
the  low  curvature  map  are  similar,  but  about  100  iterations  are  required  for  the  blob  to 
become  prominent.  Fig.  12  displays  the  curve  starting  from  the  most  salient  element. 
In  this  case  also  the  curve  is  smoothed  by  the  network  while  measuring  its  saliency. 


§5  Summary 


5.1  Brief  Summary  of  the  Scheme 

A measure of saliency S(P) is defined for the edge elements in the image. The
saliency measure is used for detecting globally salient structures in the image. As a
by-product, the process smoothly fills in gaps in fragmented contours, and provides linking
information between edge segments.

Saliency  of  a  Single  Curve 

Let $\gamma$ be a curve and P an end element of the curve. The saliency of P given $\gamma$,
$S_\gamma(P)$, is defined as:

$$S_\gamma(P) = \sum_i w_i \sigma_i$$

where $\sigma_i$ is the local saliency of the i’th edge element along $\gamma$, and $w_i$ is the weight of
the element’s contribution.

The weight $w_i$ is a product of two factors. The first factor is

$$e^{-c_i}$$

where $c_i$ is the total curvature of the curve up to the i’th element. The second factor
penalizes the existence of gaps, and is defined as:

$$\prod_{k=0}^{i} \rho_k$$

where $\rho_k$ is the attenuation factor of the k’th element along the curve. One value is
used for real edges, another for gaps in the contour.
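
The saliency of a single curve can be sketched as follows. The function name, the per-element curvature increments, and the particular attenuation values (1.0 for a real edge, 0.7 for a gap) are assumptions made for this illustration, as is the choice to include the i’th element itself in both the accumulated curvature and the attenuation product.

```python
import math

def curve_saliency(sigmas, curvatures, rhos):
    """Weighted sum of local saliencies along one curve, starting at its end element.

    sigmas:     local saliency of each element along the curve
    curvatures: curvature contributed at each element (accumulated into c_i)
    rhos:       attenuation factor of each element
    """
    total = 0.0
    c = 0.0        # total curvature accumulated up to the current element
    atten = 1.0    # product of attenuation factors up to the current element
    for sigma, kappa, rho in zip(sigmas, curvatures, rhos):
        c += kappa
        atten *= rho
        total += math.exp(-c) * atten * sigma
    return total

# A straight curve of three real edges, and the same curve with a gap in the middle.
print(curve_saliency([1, 1, 1], [0, 0, 0], [1.0, 1.0, 1.0]))
print(curve_saliency([1, 0, 1], [0, 0, 0], [1.0, 0.7, 1.0]))
```

Under these assumptions the gap lowers the saliency in two ways: the gap element contributes no local saliency of its own, and its attenuation factor discounts every element that follows it.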

The  Saliency  Measure 

The measure in section 1 depends on a particular curve $\gamma$. The saliency at P is
given by:

$$S(P) = \max_{\gamma} S_\gamma(P)$$

The maximum is taken over all possible curves terminating at P. In practice, the
definition is limited to curves of length N:

$$S_N(P) = \max_{\gamma_N} S_{\gamma_N}(P)$$

Note: the maximum is taken over all possible curves, including fragmented ones. As a
by-product, curves are filled in.

Operation  of  the  Network 

$$E_i^{(0)} = \sigma_i$$

$$E_i^{(n+1)} = \sigma_i + \rho_i \max_{j \in \delta(i)} E_j^{(n)} f_{i,j}$$

where $\delta(i)$ is the set of all neighbors of element i. The quantities $\rho_i$ and $f_{i,j}$
(“couplings”) are constants of the network. The initial inputs at step 0 are the local
saliencies $\sigma_i$. At iteration (n+1), each element simply adds the maximal contribution
from its neighbors to its own local saliency. After N iterations the computation defined
above yields the saliency measure $S_N(P)$ for each element P.
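
The update rule can be sketched on a toy graph. The dictionary-based representation, the function name, and the four-element chain used as input are assumptions made for this illustration; the update itself follows the description above, with every element updated synchronously from the previous iteration’s values.

```python
def run_network(sigma, rho, neighbors, f, iterations):
    """Iterate the local update on every element and return the final values.

    sigma:     local saliency per element
    rho:       attenuation constant per element
    neighbors: for each element, the elements a curve may continue to
    f:         coupling constant for each (element, neighbor) pair
    """
    E = dict(sigma)  # step 0: each element starts with its local saliency
    for _ in range(iterations):
        new_E = {}
        for i in sigma:
            # Best contribution among the neighbors; 0 if the element has none.
            best = max((E[j] * f[(i, j)] for j in neighbors[i]), default=0.0)
            new_E[i] = sigma[i] + rho[i] * best
        E = new_E
    return E

# Hypothetical chain 0 -> 1 -> 2 -> 3 of real, collinear elements.
sigma = {i: 1.0 for i in range(4)}
rho = {i: 1.0 for i in range(4)}
neighbors = {0: [1], 1: [2], 2: [3], 3: []}
f = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}

print(run_network(sigma, rho, neighbors, f, iterations=3))
# → {0: 4.0, 1: 3.0, 2: 2.0, 3: 1.0}
```

In this example each iteration lets an element account for a curve one element longer, so after three iterations element 0 reflects the whole four-element chain.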


5.2  General  Summary 

It is proposed that immediate perception includes processes for detecting salient
structures in the image on which subsequent processes such as segmentation and
recognition can focus. The saliency of a structure is divided into two sources, local saliency
and structural saliency. Of the two, structural saliency is more problematic from a
computational point of view since it requires the efficient computation of certain global
properties.

A locally connected network was devised to produce a saliency map, which is a
representation of the image emphasizing salient locations. The network exhibits the
following properties: (i) the computations are local and simple, (ii) the number of
iterations required is on the order of dozens, or up to about a hundred, (iii) there is little
dependence on the complexity of the image, (iv) gaps in curves are filled in the course
of the computation, (v) contours are smoothed in the course of producing a saliency
map, (vi) the network produces linking information so that curve tracing across
junctions, branches and gaps is possible, and (vii) the network is robust in the sense that
malfunction of some processing units does not seriously affect the performance of the
network.

Acknowledgement:  We  thank  E.  Grimson  for  his  comments. 
Figure 5. Saliency map of the image in Fig. 2 obtained by the network after 10 iterations.
The saliency measure of each element of the circle is significantly higher than that of the
background elements.

Figure  6.  The  curve  starting  from  the  strongest  element  in  figure  5.  Virtual  elements  are 
displayed  as  dotted  lines. 
Figure 7. The same circle as in figure 2 but with 400 background segments.

Figure  8.  Saliency  map  of  the  image  in  Fig.  7  obtained  by  the  network  after  10  iterations. 


Figure 9. Saliency map of the image in Fig. 3 obtained by the network after 30 iterations. The
region of interest virtually ‘pops out’ from the display.

Figure  10.  The  five  most  salient  curves  obtained  by  tracing  the  most  salient  elements  of  figure 
9.  The  curves  have  been  smoothed  and  gaps  have  been  filled  in. 


Figure 11. Saliency map for low curvature variation of the image in Fig. 1


Figure 12. The curve starting from the strongest element in figure 11 is traced. The curve is
smoothed by the network while measuring its saliency.